ABSTRACT

Statistical clustering is critical in designing scalable image retrieval systems. This paper presents a scalable algorithm for indexing and retrieving images based on region segmentation. The method uses statistical clustering on region features and IRM (Integrated Region Matching), a measure developed to evaluate overall similarity between images that incorporates properties of all the regions in the images by a region-matching scheme. Compared with retrieval based on individual regions, this overall similarity approach: (1) reduces the influence of inaccurate segmentation; (2) helps to clarify the semantics of a particular region; and (3) enables a simple querying interface for region-based image retrieval systems. The algorithm has been implemented as part of an experimental SIMPLIcity image retrieval system and tested on large-scale image databases of both general-purpose images and pathology slides. Experiments have demonstrated that this technique maintains the accuracy and robustness of the original system while reducing the matching time significantly. (Contains 41 references.) (Author)

Keywords 

Content-based image retrieval, wavelets, clustering, segmentation, integrated region matching.

*An on-line demonstration is provided at URL: http://wang.ist.psu.edu

†Also of Department of Computer Science and Engineering and e-Business Research Center. Research started when the author was with the Departments of Biomedical Informatics and Computer Science at Stanford University.

‡Also of Department of Electrical Engineering. Now with Cisco Systems, Inc.

Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.

JCDL'01, June 24-28, 2001, Roanoke, Virginia, USA.

Copyright 2001 ACM 1-58113-345-6/01/0006 ...$5.00.

1. INTRODUCTION



As multimedia information bases, such as the Web, grow larger, the scalability of information retrieval systems has become increasingly important. According to a report published by Inktomi Corporation and NEC Research in January 2000 [13], there are about 5 million unique Web sites (± 3%) on the Internet. Over one billion Web pages (± 35%) can be downloaded from these sites, and approximately one billion images can be found on-line. Searching for information on the Web is a serious problem [16, 17]. Moreover, the Web is currently growing exponentially, at a rate of about 50% per year.

1.1 Image retrieval 

Content-based image retrieval is the retrieval of relevant images from an image database based on automatically derived features. The need for efficient content-based image retrieval has increased tremendously in many application areas such as biomedicine, crime prevention, the military, commerce, culture, education, entertainment, and Web image classification and searching.

Content-based image retrieval has been widely studied. 
Space limitations do not allow us to present a broad survey. 
Instead we try to emphasize some of the work that is most 
related to what we propose. The references below are to be taken as examples of related work, not as a complete list of work in the cited area.

In the commercial domain, IBM QBIC [8, 25] is one of the earliest developed systems. Recently, additional systems have been developed at IBM T.J. Watson [34], VIRAGE [10], NEC C&C Research Labs [23], Bell Laboratory [24], Interpix (Yahoo), Excalibur, and Scour.net.

In academia, MIT Photobook [26, 27] is one of the earliest. Berkeley Blobworld [5], Columbia VisualSEEK and WebSEEK [33], CMU Informedia [35], University of Illinois MARS [22], University of California at Santa Barbara NeTra [21], the system developed by University of California at San Diego [14], Stanford WBIIS [36], and Stanford SIMPLIcity [38, 40] are some of the recent systems.

Many indexing and retrieval methods have been used in these image retrieval systems. Some systems use keywords and full-text descriptions to index images. Others use features such as color histograms, color layout, local texture, wavelet coefficients, and shape. In this paper, we focus on region-based retrieval of images.
1.2 Region-based retrieval

Figure 1: Query procedure of the Blobworld system developed at the University of California, Berkeley.

Figure 2: Query interface of the NeTra system developed at the University of California, Santa Barbara.



Before the introduction of region-based systems, content-based image retrieval systems used color histograms and color layout to index the content of images. The region-based approach has recently become a popular research trend. Region-based retrieval systems attempt to overcome the deficiencies of color histogram and color layout search by representing images at the object level. A region-based retrieval system applies image segmentation to decompose an image into regions, which correspond to objects if the decomposition is ideal. The object-level representation is intended to be close to the perception of the human visual system (HVS).

Many earlier region-based retrieval systems match images based on individual regions. Such systems include the NeTra system [21] and the Blobworld system [5]. Figures 1 and 2 show the querying interfaces of Blobworld and NeTra. These systems allow queries based on a limited number of regions, and a query is performed by merging single-region query results. The motivation is to shift part of the comparison task to the users. To query an image, a user is presented with the segmented regions of the image and is required to select the regions to be matched, as well as the attributes of those regions (e.g., color and texture) to be used for evaluating similarity. Such querying systems give users more control. However, the query formulation process can be very time-consuming.

1.3 Integrated region-based retrieval 

Researchers are developing similarity measures that com- 
bine information from all of the regions. One effort in this 
direction is the querying system developed by Smith and 
Li [34]. Their system decomposes an image into regions 
with characterizations pre-defined in a finite pattern library. 
With every pattern labeled by a symbol, images are then 
represented by region strings. Region strings are converted 
to composite region template (CRT) descriptor matrices that 
provide the relative ordering of symbols. Similarity between 
images is measured by the closeness between the CRT descriptor matrices. This measure is sensitive to object shifting, since a CRT matrix is determined solely by the ordering of symbols. Robustness to scaling and rotation is also not considered by the measure. Because the definition of the CRT descriptor matrix relies on the pattern library, the system performance depends critically on the library. The performance degrades if region types in an image are not represented by patterns in the library. The system in [34] uses a CRT library with patterns described only by color. In particular, the patterns are obtained by quantizing color space. If texture and shape features are also used to distinguish patterns, the number of patterns in the library will increase dramatically: roughly exponentially in the number of features, if patterns are obtained by uniformly quantizing each feature.

Li et al. of Stanford University recently developed SIM- 
PLIcity (Semantics-sensitive Integrated Matching for Pic- 
ture Libraries) [37]. SIMPLIcity uses semantics type clas- 
sification and an integrated region matching (IRM) scheme 
to provide efficient and robust region-based image match- 
ing [18]. The IRM measure is a similarity measure of images 
based on region representations. It incorporates the proper- 
ties of all the segmented regions so that information about 
an image can be fully used. With IRM, region-based image- 
to-image matching can be performed. The overall similarity 
approach reduces the adverse effect of inaccurate segmen- 
tation, helps to clarify the semantics of a particular region, 
and enables a simple querying interface for region-based im- 
age retrieval systems. Experiments have shown that IRM 
is comparatively more effective and more robust than many 
existing retrieval methods. Like other region-based systems, 
the SIMPLIcity system is a linear matching system. To per- 
form a query, the system compares the query image with all 
images in the same semantic class. 

1.4 Statistical clustering 

There have been many efforts to statistically cluster the high-dimensional feature space before the actual search, using various tree structures such as the K-D-B-tree [28], quadtree [9], R-tree [11], R+-tree [31], R*-tree [1], X-tree [3], SR-tree [15], M-tree [6], TV-tree [19], and hB-tree [20]. As noted in [4, 2, 1, 15, 41], the speed and accuracy of these algorithms degrade in very high-dimensional spaces. This is referred to as the curse of dimensionality. In addition, many of the clustering and indexing algorithms are designed for general-purpose feature spaces such as Euclidean space. We developed our own algorithm for clustering and indexing image databases because we wanted it to be suited to our IRM region matching scheme.
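A small numerical illustration of the curse of dimensionality may help; it is a sketch, not part of the system. For uniformly distributed points, the gap between a query's nearest and farthest neighbour shrinks relative to the nearest distance as the dimension grows, which erodes the pruning that tree-based indexes rely on. The function name `relative_contrast` and the uniform unit-hypercube data are illustrative assumptions.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """(d_max - d_min) / d_min for distances from one random query point
    to n_points random points, all drawn uniformly from the unit hypercube."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(query, [rng.random() for _ in range(dim)])
             for _ in range(n_points)]
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    for dim in (2, 10, 100, 1000):
        print(f"d = {dim:4d}  relative contrast = {relative_contrast(dim):.3f}")
```

The printed contrast is typically well above 1 at d = 2 and only a small fraction at d = 1000; exact values depend on the seed.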

1.5 Overview 

In this paper, we present an enhancement to the SIMPLIcity system for handling image libraries with millions of images. The targeted applications include Web image retrieval and biomedical image retrieval. Region features of images in the same semantic class are clustered automatically using a statistical clustering method. Features in the same cluster are stored in the same file for efficient access during the matching process. IRM (Integrated Region Matching) is used in the query matching process. Tested on large-scale image databases, the system has demonstrated high accuracy, robustness, and scalability.

The remainder of the paper is organized as follows. In Sec- 
tion 2, the similarity matching process based on segmented 
regions is defined. In Section 3, we describe the experiments 
we performed and provide results. We discuss limitations of 
the system in Section 4. We conclude in Section 5. 

2. THE SIMILARITY MEASURE 

In this section, we describe the similarity matching process we developed. We briefly describe the segmentation process and related notation in Section 2.1. The feature space analysis process is described in Section 2.2. In Section 2.3, we give details of the matching scheme, and in Section 2.4 we present the RF*IPF weighting.

2.1 Region segmentation 

Semantically-precise image segmentation is extremely dif- 
ficult and is still an open problem in computer vision [32, 
39]. We attempt to develop a robust matching metric that 
can reduce the adverse effect of inaccurate segmentation. 
The segmentation process in our system is very efficient be- 
cause it is essentially a wavelet-based fast statistical cluster- 
ing process on blocks of pixels. 

To segment an image, we partition the image into blocks of t × t pixels and extract a feature vector for each block. The k-means algorithm is used to cluster the feature vectors into several classes, with every class corresponding to one region in the segmented image. We determine k dynamically, based on the complexity of the image: we start with k = 2 and, if necessary, refine to larger values such as k = 4. We do not require the clusters to be spatially contiguous because we rely on a robust matching process. The details of the segmentation process are described in [18].

Six features are used for segmentation. Three of them are the average color components in a t × t block. The other three represent energy in the high-frequency bands of wavelet transforms [7], that is, the square root of the second-order moment of the wavelet coefficients in the high-frequency bands. We use the well-known LUV color space, where L encodes luminance and U and V encode color information (chrominance). The LUV color space has good perceptual correlation properties. We chose the block size t to be 4 as a compromise between texture detail and computation time.
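The six block features can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's code: the block is assumed to be already converted to LUV, the wavelet is taken to be a one-level orthonormal Haar transform applied to the L channel only, and the helper names (`haar_level1`, `band_energy`, `block_features`) are hypothetical.

```python
import math


def haar_level1(block):
    """One-level 2-D Haar transform of a 4x4 block (list of 4 rows).
    Returns the three 2x2 high-frequency (detail) bands."""
    col_diff, row_diff, diag = [], [], []
    for r in range(0, 4, 2):
        c_row, r_row, d_row = [], [], []
        for c in range(0, 4, 2):
            a, b = block[r][c], block[r][c + 1]
            d, e = block[r + 1][c], block[r + 1][c + 1]
            c_row.append((a - b + d - e) / 2)  # differences across columns
            r_row.append((a + b - d - e) / 2)  # differences across rows
            d_row.append((a - b - d + e) / 2)  # diagonal differences
        col_diff.append(c_row)
        row_diff.append(r_row)
        diag.append(d_row)
    return col_diff, row_diff, diag


def band_energy(band):
    """Square root of the second-order moment of the coefficients in a band."""
    coeffs = [v for row in band for v in row]
    return math.sqrt(sum(v * v for v in coeffs) / len(coeffs))


def block_features(l_block, u_block, v_block):
    """Six features for one 4x4 block: mean L, U, V plus three
    high-band wavelet energies computed on the L channel (assumption)."""
    def mean(b):
        return sum(v for row in b for v in row) / 16.0
    color = [mean(l_block), mean(u_block), mean(v_block)]
    texture = [band_energy(band) for band in haar_level1(l_block)]
    return color + texture
```

A uniform block yields zero texture energy, while a block of alternating bright and dark columns puts all of its energy into a single detail band.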

Let N denote the total number of images in the image database. For the i-th image in the database, denoted R_i, we obtain a set of m_i feature vectors after the region segmentation process. Each of these m_i d-dimensional feature vectors represents the dominant visual features (including color and texture) of a region, the shape of that region, its rough location in the image, and some statistics of the features obtained in that region.

2.2 Feature space analysis 

The new integrated region matching scheme depends on the entire picture library, so we must first process and analyze the characteristics of the d-dimensional feature space.

Suppose the feature vectors in the d-dimensional feature space are {x_i : i = 1, ..., L}, where L is the total number of regions in the picture library. Then $L = \sum_{i=1}^{N} m_i$.

The goal of the feature clustering algorithm is to partition the features into k groups with centroids $\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_k$ such that

$$D(k) = \frac{1}{L} \sum_{i=1}^{L} \min_{1 \le j \le k} d(x_i, \hat{x}_j) \qquad (1)$$

is minimized. That is, the average distance between a feature vector and the nearest group centroid is minimized. Two necessary conditions for the k groups are:

1. Each feature vector is partitioned into the cluster with 
the nearest centroid to it. 

2. The centroid of a cluster is the vector minimizing the average distance from it to the feature vectors in the cluster. In the special case of the squared Euclidean distance, the centroid is the mean vector of all the feature vectors in the cluster.

These requirements of our feature grouping process are 
the same requirements as those of the Lloyd algorithm [12] 
to find k cluster means with the following steps: 

1. Initialization: choose the initial k cluster centroids. 

2. Loop until the stopping criterion is met: 

(a) For each feature vector in the data set, assign it 
to a class such that the distance from this feature 
to the centroid of that cluster is minimized. 

(b) For each cluster, recalculate its centroid as the 
mean of all the feature vectors partitioned to it. 
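The two steps above translate almost line for line into code. The sketch below assumes Euclidean distance on plain Python tuples and initializes the centroids with a seeded random sample of the data; the initialization rule, the `max_iter` safeguard, and the function names are illustrative assumptions, since the text does not fix them.

```python
import math
import random


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def lloyd(points, k, max_iter=100, seed=0):
    """Lloyd's k-means iteration on a list of equal-length tuples.
    Returns (centroids, labels)."""
    rng = random.Random(seed)
    # Step 1: initial centroids, here a seeded random sample of the data (assumption).
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Step 2(a): assign every feature vector to its nearest centroid.
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:  # no vector changed class: converged
            break
        labels = new_labels
        # Step 2(b): move each centroid to the mean of the vectors assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels
```

For example, `lloyd([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], 2)` separates the three points near the origin from the three near (10, 10).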

If the Euclidean distance is used, the k-means algorithm results in hyperplanes as cluster boundaries (Figure 3). That is, for the feature space R^d, the cluster boundaries are (d-1)-dimensional hyperplanes.
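The hyperplane claim follows from a short calculation. For two centroids $\hat{x}_1$ and $\hat{x}_2$, the points equidistant from both under the Euclidean distance satisfy

```latex
\|x - \hat{x}_1\|^2 = \|x - \hat{x}_2\|^2
\iff -2\,\hat{x}_1^{\top} x + \|\hat{x}_1\|^2 = -2\,\hat{x}_2^{\top} x + \|\hat{x}_2\|^2
\iff 2\,(\hat{x}_2 - \hat{x}_1)^{\top} x = \|\hat{x}_2\|^2 - \|\hat{x}_1\|^2 ,
```

which is linear in x: a hyperplane in R^d with normal vector $\hat{x}_2 - \hat{x}_1$ passing through the midpoint $(\hat{x}_1 + \hat{x}_2)/2$. Each cluster's region is therefore an intersection of half-spaces, i.e., a convex polyhedron.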

Both the initialization process and the stopping criterion are critical. We initialize the algorithm adaptively, choosing the number of clusters k by gradually increasing it and stopping when a criterion is met. We start with k = 2. For a given k, the k-means algorithm terminates when no more feature vectors change classes. Termination is guaranteed because neither step of k-means (assigning vectors to their nearest centroids and recomputing cluster centroids) increases the average class variance, and there are only finitely many partitions of the data. In practice, running to completion may require a large number of iterations. The cost of each iteration is O(kn) for data size n. Our stopping criterion is to stop once the average class variance is smaller than a threshold or once the reduction of the class variance is smaller than a threshold.
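One way to read the adaptive procedure is sketched below: k grows from 2, each k is run to convergence, and growth stops once the average class variance drops below one threshold or the reduction gained by the extra cluster falls below a second, relative threshold. The deterministic farthest-point (maximin) initialization, the relative form of the reduction test, the fallback to the previous k, and the cap `k_max` are all assumptions of this sketch rather than details given in the text.

```python
import math


def maximin_init(points, k):
    """Deterministic seeding (assumption): start from the first point, then
    repeatedly add the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd iterations until no feature vector changes class."""
    centroids = maximin_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def avg_class_variance(points, centroids, labels):
    """Mean squared Euclidean distance from each vector to its centroid."""
    return sum(math.dist(p, centroids[lab]) ** 2
               for p, lab in zip(points, labels)) / len(points)


def adaptive_kmeans(points, var_threshold, gain_threshold, k_max=16):
    """Increase k from 2; stop when the variance is below var_threshold, or
    when the relative reduction from k-1 to k is below gain_threshold."""
    prev = None  # (k, variance, centroids, labels) for the previous k
    for k in range(2, min(k_max, len(points)) + 1):
        centroids, labels = kmeans(points, k)
        var = avg_class_variance(points, centroids, labels)
        if var < var_threshold:
            return k, centroids, labels
        if prev is not None and prev[1] - var < gain_threshold * prev[1]:
            return prev[0], prev[2], prev[3]  # the extra cluster did not pay off
        prev = (k, var, centroids, labels)
    return prev[0], prev[2], prev[3]
```

On three well-separated groups of three points, `adaptive_kmeans(points, var_threshold=1.0, gain_threshold=0.05)` stops at k = 3, because the variance at k = 2 is large and at k = 3 falls below the threshold.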
Figure 3: The k-means algorithm partitions the feature space using hyperplanes.



2.3 Image matching 

To retrieve images similar to a query image, we first locate the clusters of the feature space to which the regions of the query image belong. Assume that the centroids of the set of k clusters are {c_1, c_2, ..., c_k} and that the query image is represented by the region set R = {r_1, r_2, ..., r_m}, where r_i is the descriptor of region i. For each region feature r_i, we find j such that

$$d(r_i, c_j) = \min_{1 \le l \le k} d(r_i, c_l),$$

where d(·,·) is the region-to-region distance defined for the system. This distance can be non-Euclidean. We create a list of the clusters found in this way, denoted {c_{j_1}, c_{j_2}, ..., c_{j_m}}. The matching algorithm further investigates only these 'suspect' clusters to answer the query.

With the list of 'suspect' clusters, we create a list of 'suspect' images. An image in the database is a 'suspect' image for the query if it contains at least one region feature in these 'suspect' clusters. This step can be accomplished by merging the image-ID lists of the suspect clusters while removing duplicates.
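The cluster lookup and the duplicate-free merge of image IDs can be sketched with an inverted index from cluster to image IDs. This is a minimal in-memory illustration, assuming region features are tuples and using Euclidean distance by default (the `dist` parameter can be replaced by another region distance, as the text allows); the system itself stores each cluster's features in a file, and the names `build_cluster_index` and `suspect_images` are hypothetical.

```python
import math


def nearest_cluster(feature, centroids, dist=math.dist):
    """Index j minimizing dist(feature, centroids[j])."""
    return min(range(len(centroids)), key=lambda j: dist(feature, centroids[j]))


def build_cluster_index(image_regions, centroids, dist=math.dist):
    """Inverted index: cluster index -> set of image IDs that have at least
    one region feature assigned to that cluster."""
    index = {j: set() for j in range(len(centroids))}
    for image_id, regions in image_regions.items():
        for feature in regions:
            index[nearest_cluster(feature, centroids, dist)].add(image_id)
    return index


def suspect_images(query_regions, centroids, index, dist=math.dist):
    """Return the query's 'suspect' clusters and the duplicate-free union
    of the image IDs stored under them."""
    clusters = {nearest_cluster(r, centroids, dist) for r in query_regions}
    images = set()
    for j in clusters:
        images |= index[j]
    return clusters, images
```

Only the returned images are then ranked with the IRM distance; with clusters of about 1100 images and 4.6 regions per query (Section 3.1), that is at most about 5060 candidates instead of the whole database.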

To define the similarity measure between two sets of regions, we assume that image 1 and image 2 are represented by region sets R_1 = {r_1, r_2, ..., r_m} and R_2 = {r'_1, r'_2, ..., r'_n}, where r_i or r'_j is the descriptor of region i or j, respectively. Denote the distance between regions r_i and r'_j as d(r_i, r'_j), written d_ij for short. To compute the distance between region sets R_1 and R_2, d(R_1, R_2), we first compute all pairwise region-to-region distances between the two images. Our matching scheme aims at building a correspondence between regions that is consistent with human perception. To increase robustness against segmentation errors, we allow a region to be matched to several regions in the other image. A matching between r_i and r'_j is assigned a significance credit s_ij ≥ 0, which indicates the importance of that matching for determining the similarity between the images. The matrix S = {s_ij}, 1 ≤ i ≤ m, 1 ≤ j ≤ n, is referred to as the significance matrix.

The distance between the two region sets is the sum of all pairwise region distances weighted by their significance credits, i.e.,

$$d_{IRM}(R_1, R_2) = \sum_{i=1}^{m} \sum_{j=1}^{n} s_{ij} d_{ij}.$$

This distance is the integrated region matching (IRM) distance defined by Li et al. in [18].



To choose the significance matrix S, a natural issue to raise is what constraints should be put on s_ij so that the admissible matching yields a good similarity measure. In other words, what properties do we expect an admissible matching to possess? The first property we want to enforce is the fulfillment of significance. Assuming that the significance of r_i in image 1 is p_i and that of r'_j in image 2 is p'_j, we require that

$$\sum_{j=1}^{n} s_{ij} = p_i, \quad i = 1, \ldots, m,$$

$$\sum_{i=1}^{m} s_{ij} = p'_j, \quad j = 1, \ldots, n.$$

A greedy scheme is developed to speed up the determination of the matrix S = {s_ij}. Details of the algorithm can be found in [18].
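The sketch below shows one greedy assignment in this spirit: region pairs are visited in order of increasing distance, and each pair receives as much significance as both regions still have available (a "most similar, highest priority" rule). It is an illustration under that assumption, not a transcription of the algorithm in [18]; the function name `irm_distance` and the list-of-lists inputs are hypothetical. Because every pair is eventually visited and both significance vectors have the same total, the row and column sums of S end up equal to p and p', which satisfies the fulfillment-of-significance constraints.

```python
def irm_distance(dist, p, q):
    """Greedy significance assignment followed by the IRM sum.

    dist[i][j]: distance d_ij between region i of image 1 and region j of image 2
    p[i], q[j]: region significances of image 1 and image 2 (equal totals, e.g. 1)
    Returns (d_irm, S) where S is the significance matrix.
    """
    m, n = len(p), len(q)
    left_p, left_q = list(p), list(q)  # significance not yet assigned
    S = [[0.0] * n for _ in range(m)]
    # Visit region pairs from most similar (smallest distance) to least similar.
    for _d, i, j in sorted((dist[i][j], i, j) for i in range(m) for j in range(n)):
        s = min(left_p[i], left_q[j])  # give the pair all it can still take
        if s > 0:
            S[i][j] = s
            left_p[i] -= s
            left_q[j] -= s
    d_irm = sum(S[i][j] * dist[i][j] for i in range(m) for j in range(n))
    return d_irm, S
```

With distances [[1.0, 3.0], [2.0, 0.5]], p = [0.5, 0.5], and p' = [0.6, 0.4], the closest pair (0.5) is filled first with 0.4, the pair at distance 1.0 with 0.5, and the leftover 0.1 goes to the pair at distance 2.0, giving d_IRM = 0.4*0.5 + 0.5*1.0 + 0.1*2.0 = 0.9.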

2.4 The RF*IPF weighting 

For applications such as biomedical image retrieval, local features are critically important in distinguishing the semantics of two images. In this section, we present the Region Frequency and Inverse Picture Frequency (RF*IPF) weighting, a relatively simple weighting measure developed to further enhance the discriminating efficiency of IRM based on the characteristics of the entire picture library. This weighting can be used to emphasize uncommon features.

The definition of RF*IPF is similar in spirit to that of the Term Frequency and Inverse Document Frequency (TF*IDF) weighting [30], a highly effective technique in document retrieval. The combination of RF*IPF and IRM is more effective than IRM alone in a variety of image retrieval applications. Additionally, this weighting measure provides a better unification of content-based image retrieval and text-based image retrieval.

The RF*IPF weighting consists of two parameters: the 
Region Frequency (RF) and the Inverse Picture Frequency 
(IPF). 

For each region feature vector x_i of the image R_j, we find the closest group centroid from the list of k centroids computed in the feature analysis step. That is, we find c_0 such that

$$\| x_i - \hat{x}_{c_0} \| = \min_{1 \le c \le k} \| x_i - \hat{x}_c \|. \qquad (2)$$

Let $N_{c_0}$ denote the number of pictures in the database with at least one region feature closest to the centroid $\hat{x}_{c_0}$ of the image group c_0. Then we define

$$IPF_i = \log\left(\frac{N}{N_{c_0}}\right), \qquad (3)$$

where IPF_i is the Inverse Picture Frequency of the feature x_i.

Now let M_j denote the total number of pixels in the image R_j. For images in a size-normalized picture library, M_j is the same constant for all j. Denote P_ij as the area percentage of region i in the image R_j. Then we define

$$RF_{ij} = \log(P_{ij} M_j) + 1 \qquad (4)$$

as the Region Frequency of the i-th region in picture j. RF measures how frequently a region feature occurs in a picture.

We can now assign a weight for each region feature in each picture. The RF*IPF weight for the i-th region in the j-th image R_j is defined as

$$W_{ij} = RF_{ij} \cdot IPF_i. \qquad (5)$$

Clearly, the definition is close to that of the TF*IDF (Term 
Frequency times Inverse Document Frequency) weighting in 
text retrieval. 
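Equations (2) to (5) can be combined into a short sketch. It assumes Euclidean distance for Eq. (2), natural logarithms, IPF in the IDF-like form log(N / N_c0), and area fractions in [0, 1] so that P_ij M_j is a region's pixel count; the input layout and the function names are hypothetical.

```python
import math


def nearest_centroid(x, centroids):
    """Eq. (2): index c0 of the centroid closest to x (Euclidean, assumed)."""
    return min(range(len(centroids)), key=lambda c: math.dist(x, centroids[c]))


def rf_ipf_weights(images, centroids):
    """images: image_id -> (num_pixels M_j, [(feature x_i, area fraction P_ij), ...]).
    Returns image_id -> list of RF*IPF weights W_ij, one per region."""
    n_images = len(images)
    nearest = {img: [nearest_centroid(x, centroids) for x, _ in regions]
               for img, (_, regions) in images.items()}
    # N_c0: number of pictures with at least one region closest to centroid c0.
    pictures_per_cluster = {}
    for clusters in nearest.values():
        for c in set(clusters):
            pictures_per_cluster[c] = pictures_per_cluster.get(c, 0) + 1
    weights = {}
    for img, (num_pixels, regions) in images.items():
        w = []
        for (_, frac), c0 in zip(regions, nearest[img]):
            ipf = math.log(n_images / pictures_per_cluster[c0])  # Eq. (3), assumed form
            rf = math.log(frac * num_pixels) + 1                 # Eq. (4)
            w.append(rf * ipf)                                   # Eq. (5)
        weights[img] = w
    return weights
```

A region type that occurs in every picture gets IPF = log(1) = 0, so, as with IDF, ubiquitous features carry no weight while rare ones are emphasized; a practical implementation would also need to guard the later normalization against an image whose weights are all zero.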

After computing the RF*IPF weights for all the L regions 
in all the N images in the image database, we store these 
weights for the image matching process. 

We now combine the IRM distance with the RF*IPF weighting in the process of choosing the significance matrix S. As before, the first property we want an admissible matching to possess is the fulfillment of significance; now, however, the significance of each region is computed from its RF*IPF weight. With W_{i,R_1} denoting the weight of r_i in image R_1 and W_{j,R_2} the weight of r'_j in image R_2, we require that

$$\sum_{j=1}^{n} s_{ij} = p_i = \frac{W_{i,R_1}}{\sum_{l=1}^{m} W_{l,R_1}}, \quad i = 1, \ldots, m,$$

$$\sum_{i=1}^{m} s_{ij} = p'_j = \frac{W_{j,R_2}}{\sum_{l=1}^{n} W_{l,R_2}}, \quad j = 1, \ldots, n.$$

3. EXPERIMENTS

This algorithm has been implemented and compared with the first version of our experimental SIMPLIcity image retrieval system. We tested the system on a general-purpose image database (from COREL) of about 200,000 pictures, stored in JPEG format at size 384 × 256 or 256 × 384. To conduct a fair comparison, we use only picture features in the retrieval process.

3.1 Speed 

On a Pentium III 800MHz PC running Linux, it takes approximately 60 hours to compute the feature vectors for the 200,000 color images of size 384 × 256 in our general-purpose image database. On average, one second is needed to segment an image and to compute the features of all its regions. Fast indexing enables us to handle outside queries and sketch queries in real time.

The feature clustering process is performed only once for each database. The Lloyd algorithm takes about 30 minutes of CPU time and results in clusters with an average of 1100 images. Our image segmentation process generates an average of 4.6 regions per image. That is, on average a 'suspect' list for a query image contains at most 1100 × 4.6 = 5060 images.

Table 1: The average performance for each image category evaluated by average precision (p).

Category        IRM     fast IRM   EMD2    EMD1
1. Africa       0.475   0.472      0.288   0.132
2. Beach        0.325   0.323      0.286   0.134
3. Buildings    0.330   0.307      0.233   0.160
4. Buses        0.363   0.389      0.267   0.108
5. Dinosaurs    0.981   0.635      0.914   0.143
6. Elephants    0.400   0.390      0.384   0.169
7. Flowers      0.402   0.447      0.416   0.113
8. Horses       0.719   0.669      0.386   0.096
9. Mountains    0.342   0.335      0.218   0.198
10. Food        0.340   0.340      0.207   0.114

The matching speed is fast. When the query image is in 
the database, it takes about 0.15 seconds of CPU time on 
average to sort all the images in the 200,000-image database 
using our similarity measure. This is a tenfold speed-up over the original system, which takes about 1.5 seconds per query.
If the query is not in the database, one extra second of CPU 
time is spent to process the query. 

Figures 4 and 5 show the results of sample queries. Due 
to the limitation of space, we show only two rows of images 
with the top 11 matches to each query. In the next section, 
we provide numerical evaluation results by systematically 
comparing several systems. 

Because of the fast indexing and retrieval speed, we allow the user to submit any image on the Internet as a query image by entering its URL (Figure 6). Our system can handle any image format from anywhere on the Internet that is reachable by our server via the HTTP protocol. The image is downloaded and processed by our system on the fly. The high efficiency of our image segmentation and matching algorithms made this feature possible.§ To our knowledge, this feature is unique in the sense that no other commercial or academic system allows such queries.

3.2 Accuracy on image categorization 

We conducted an extensive evaluation of the system. One experiment was based on a subset of the COREL database formed by 10 image categories, each containing 100 pictures. Within this database, it is known whether any two images are of the same category. In particular, a retrieved image is considered a match if and only if it is in the same category as the query. This assumption is reasonable since the 10 categories were chosen so that each depicts a distinct semantic topic. Every image in the sub-database was tested as a query, and the retrieval ranks of all the remaining images were recorded.
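The evaluation protocol can be sketched as follows: every image acts as a query, its own entry is dropped from the ranking (assumed here from "the rest" of the images), and precision is the fraction of the first 100 retrieved images that share the query's category, averaged per category as in Table 1. "Average precision" in this sense means precision at 100 averaged over queries, not the ranked-retrieval AP measure. The data layout and function names are illustrative assumptions.

```python
def precision_at(ranked_ids, category_of, query_id, cutoff=100):
    """Fraction of the first `cutoff` retrieved images (query itself
    excluded, an assumption) that share the query's category."""
    retrieved = [i for i in ranked_ids if i != query_id][:cutoff]
    hits = sum(1 for i in retrieved if category_of[i] == category_of[query_id])
    return hits / cutoff


def category_average_precision(rankings, category_of, cutoff=100):
    """rankings: query_id -> ranked list of image IDs for that query.
    Returns category -> average precision p over the category's queries."""
    totals, counts = {}, {}
    for query_id, ranked in rankings.items():
        cat = category_of[query_id]
        totals[cat] = totals.get(cat, 0.0) + precision_at(ranked, category_of, query_id, cutoff)
        counts[cat] = counts.get(cat, 0) + 1
    return {cat: totals[cat] / counts[cat] for cat in totals}
```

As a sanity check on the random baseline: with 10 categories of 100 images and the query excluded, 99 of the other 999 images are relevant, so a random ranking has an expected precision of 99/999 ≈ 0.099, in line with the figure of about 0.1 for random ranking.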

For each query, we computed the precision within the first 100 retrieved images. The recall within the first 100 retrieved images was not computed because, in this special case, it is proportional to the precision: the total number of semantically related images for each query is fixed at 100. The average performance for each image category in terms of average precision is listed in Table 1, where p denotes precision. For a system that ranks images randomly, the average p is about 0.1.

§Another region-based CBIR system [5] takes approximately 8 minutes of CPU time to segment an image.

Figure 4: Best 11 matches of a sample query. The database contains 200,000 images from the COREL image library. The upper left corner is the query image. The second image in the first row is the best match.

Figure 5: Two other query examples.

Figure 6: The external query interface. The best 17 matches are presented for a query image selected by the user from the Stanford top-level Web page. The user enters the URL of the query image (shown in the upper-left corner, http://www.stanford.edu/horae/pics/h-quad.jpg) to form a query.

Figure 7: Comparing with color histogram methods on average precision p. Color Histogram 1 gives an average of 13.1 filled color bins per image, while Color Histogram 2 gives an average of 42.6 filled color bins per image. SIMPLIcity partitions an image into an average of only 4.3 regions.

We carried out similar evaluation tests for color histogram matching. We extracted color histogram features in the LUV color space and matched them on the categorized image database using a metric similar to the Earth Mover's Distance (EMD) described in [29]. Two different color bin sizes, giving an average of 13.1 and 42.6 filled color bins per image, were evaluated. We call the one with fewer filled color bins the Color Histogram 1 system and the other the Color Histogram 2 system. Figure 7 shows their performance compared with the Lloyd-based SIMPLIcity system. Both color histogram-based matching systems perform much worse than the Lloyd-based system in almost all image categories. The Color Histogram 2 system performs better than the Color Histogram 1 system because its larger number of filled bins gives a more detailed color separation. However, the Color Histogram 2 system is too slow to obtain matches on larger databases. The original SIMPLIcity runs at about twice the speed of the faster Color Histogram 1 system and gives much better search accuracy than the slower Color Histogram 2 system.

The overall performance of the Lloyd-based system is close to that of the original system, which uses IRM with the area percentages of the segmented regions as significance constraints. Both the regular IRM and the fast IRM algorithms are much more accurate than the EMD-based color histogram matching. Experiments on a database of 70,000 pathology slides demonstrated similar comparison results.

3.3 Robustness 

Like the original SIMPLIcity system [38], the current system is fairly robust to image alterations such as intensity variation, sharpness variation, intentional color and shape distortions, cropping, shifting, and rotation. On average, the system is robust to approximately 10% brightening, 8% darkening, blurring with a 15 × 15 Gaussian filter, 70% sharpening, 20% more saturation, 10% less saturation, random spread by 30 pixels, and pixelization by 25 pixels. This robustness is important for biomedical image databases, because the visual features of a query image are usually not identical to those of the semantically relevant images in the database, owing to problems such as occlusion and differences in intensity and focus.

4. DISCUSSIONS 

The system has several limitations. (1) Like other CBIR 
systems, SIMPLIcity assumes that images with similar se- 
mantics share some similar features. This assumption may 
not always hold. (2) The shape matching process is not 
ideal. When an object is segmented into many regions, 
the IRM distance should be computed after merging the 
matched regions. (3) The querying interfaces are not pow- 
erful enough to allow users to formulate their queries freely. 



For different user domains (e.g., biomedicine, Web image re- 
trieval), the query interfaces should ideally provide different 
sets of functions. 

In our current system, the set of features for a particu- 
lar image category is determined empirically based on the 
perception of the developers. For example, shape-related 
features are not used for textured images. Automatic deriva- 
tion of optimal features is a challenging and important issue 
in its own right. A major difficulty in feature selection is 
the lack of information about whether any two images in 
the database match with each other. The only reliable way 
to obtain this information is through manual assessment, 
which is formidable for a database of even moderate size. 
Furthermore, human evaluation is difficult to keep consistent from person to person. To explore feature selection, preliminary studies can be carried out with relatively small databases.
A database can be formed from several distinctive groups of 
images, among which only images from the same group are 
considered matched. A search algorithm can be developed 
to select a subset of candidate features that provides op- 
timal retrieval according to an objective performance mea- 
sure. Although such studies are likely to be seriously biased, 
insights regarding which features are most useful for a cer- 
tain image category may be obtained. 

The main limitation of our current evaluation results is that they are based mainly on precision or variations of precision. In practice, a system with a high overall precision may have a low overall recall; precision and recall often trade off against each other. It is extremely time-consuming to manually create detailed descriptions for all the images in our database in order to obtain numerical comparisons on recall. The COREL database provides us with rough semantic labels on the images. Typically, an image is associated with one keyword about the main subject of the image. For example, a group of images may be labeled as “flower” and another group as “Kyoto, Japan”. If we use descriptions such as “flower” and “Kyoto, Japan” as definitions of relevance to evaluate CBIR systems, we are unlikely to obtain a consistent performance evaluation. A system may perform very well on one query (such as the flower query) but very poorly on another (such as the Kyoto query). Until this limitation is thoroughly investigated, the evaluation results reported in the comparisons should be interpreted cautiously.

5. CONCLUSIONS AND FUTURE WORK 

We have developed a scalable integrated region-based image retrieval system. The system uses the IRM measure and the Lloyd algorithm. The algorithm has been implemented as part of the IRM metric in our experimental SIMPLIcity image retrieval system. Tested on a database of about 200,000 general-purpose images, the technique has demonstrated high efficiency and robustness. The main difference between this system and the previous SIMPLIcity system is the statistical clustering process, which significantly reduces the computational cost of IRM-based matching.

The clustering efficiency can be improved by using a better statistical clustering algorithm. Better statistical modeling and matching schemes are likely to improve the matching accuracy of the system. We also plan to apply the methods to special image databases (e.g., biomedical) and to very large multimedia document databases (e.g., the WWW and video).